Failure Detection vs Group Membership in Fault-Tolerant Distributed Systems: Hidden Trade-Offs

Author

  • André Schiper

Abstract

Failure detection and group membership are two important components of fault-tolerant distributed systems. Understanding their role is essential when developing efficient solutions, not only in failure-free runs, but also in runs in which processes do crash. While group membership provides consistent information about the status of processes in the system, failure detectors provide inconsistent information. This paper discusses the trade-offs related to the use of these two components, and clarifies their roles using three examples. The first example shows a case where group membership may favourably be replaced by a failure detection mechanism. The second example illustrates a case where group membership is mandatory. Finally, the third example shows a case where neither group membership nor failure detectors are needed (they may be replaced by weak ordering oracles).
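The abstract contrasts failure detectors, which give each process its own and possibly inconsistent opinion, with group membership, which gives all processes the same view. The small simulation below illustrates that difference. It is not taken from the paper. `TimeoutFailureDetector`, `View`, `MembershipService`, the 2.0 timeout and the heartbeat times are all hypothetical. The membership service is a centralized stand-in for what a real system would implement as an agreement protocol among the processes.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutFailureDetector:
    """Local, timeout-based failure detector kept by ONE process (hypothetical)."""
    timeout: float
    last_heard: dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, sender: str, now: float) -> None:
        # Record when this process last received a heartbeat from `sender`.
        self.last_heard[sender] = now

    def suspects(self, now: float) -> set[str]:
        # Suspect every monitored process that has been silent longer than the timeout.
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}


@dataclass(frozen=True)
class View:
    view_id: int
    members: frozenset[str]


class MembershipService:
    """Centralized stand-in for a membership protocol (hypothetical).

    A real service would agree on each view change using an agreement
    protocol. Here a single object orders the changes, so every process
    that reads the history sees the same sequence of views.
    """

    def __init__(self, initial: set[str]) -> None:
        self.history = [View(0, frozenset(initial))]

    def exclude(self, process: str) -> View:
        # Install a new view without `process`, with the next view id.
        current = self.history[-1]
        new_view = View(current.view_id + 1, current.members - {process})
        self.history.append(new_view)
        return new_view

    def views(self) -> list[View]:
        # Every process reads the same, totally ordered list of views.
        return list(self.history)


# Processes p and q both monitor r. r's last heartbeat reaches q later than p
# (e.g. over a slow link), so their local detectors disagree at t = 4.0.
fd_p = TimeoutFailureDetector(timeout=2.0)
fd_q = TimeoutFailureDetector(timeout=2.0)
fd_p.on_heartbeat("r", now=1.0)
fd_q.on_heartbeat("r", now=3.5)
print(fd_p.suspects(now=4.0))  # {'r'}
print(fd_q.suspects(now=4.0))  # set()

# With membership, the exclusion of r is a single ordered event seen by all.
membership = MembershipService({"p", "q", "r"})
membership.exclude("r")
view_seen_by_p = membership.views()[-1]
view_seen_by_q = membership.views()[-1]
print(view_seen_by_p == view_seen_by_q)  # True
```

The trade-off appears directly in the sketch. The failure detector is cheap and purely local, but p and q can hold different suspicions at the same moment. The membership service gives every process the same view, at the price of coordinating every view change.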

Similar resources

Node Failure Detection and Membership in CANELy

Fault-tolerant distributed systems based on fieldbuses may benefit to a great extent from the availability of semantically rich communication services, such as those provided by group communication, clock synchronization, membership and failure detection. This is especially true of distributed critical control applications. However, the migration of those services to the realm of simple fieldbus...

Failure Detection in Asynchronous Distributed Systems

Being able to detect failures is an important issue in designing fault-tolerant distributed systems. However, the actual behaviour of a system limits the ability to provide such a mechanism. At one extreme of the spectrum, synchronous systems (i.e., with bounded message transmission delay and processing times) allow for the construction of perfect failure detection based simply on local timeo...

Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms

Protocols that solve agreement problems are essential building blocks for fault-tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant alg...

A Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks

We introduce a new algorithm for consistent failure detection in asynchronous systems. Informally, consistent failure detection requires processes in a distributed system to distinguish between two different populations: a fault-free population and a faulty one. The major contribution of this paper is in combining ideas from group membership and leader election, in order to have an election prot...

Architecture and Protocols for Fault-Tolerant Distributed Objects

We present an architecture and protocols for distributed fault-tolerant objects. The architecture and protocols are based on a novel object-centric formulation of the reliable group multicast problem which we refer to as the Group State Synchronization Problem (GSSP). This formulation allows us to capture and represent fault-tolerance semantics at the object level. The GSSP formulation has thre...

Publication date: 2002